FOREWORD


The COMBAT COMMUNICATIONS Task employs controlled laboratory experimentation in
studies designed to improve the overall performance of personnel involved in tactical
communications operations. Concentrating for the present on voice communications, the research seeks to
attain greater speed, accuracy, and completeness in the extraction of information from voice-radio
and telephone media. Three primary objectives are: (1) to increase the efficiency of radio-telephone
communications in a tactical environment; (2) to enhance the performance of transcribers and
analysts in the extraction of information from communications media; and (3) to develop improved
human factors techniques for tactical electronic countermeasures.

A  previous  study  (TRN  175)  dealt  with  the  ability  of  personnel  untrained  in  communications  to 
rate  their  own  performance  in  receiving  and  transcribing  voice-radio  messages  embedded  in  noise. 
The  present  study  sought  to  determine  whether  operational  communications  personnel  could  rate 
their  performance  with  greater  precision. 

The  research  was  conducted  under  Subtask  b,  "Development  of  improved  work  methods  for 
message  transmission,  reception,  and  transcription",  FY  1967  Work  Program.  In  addition  to 
research  on  confidence  ratings,  studies  are  conducted  to  improve  the  operator's  performance 
through  such  factors  as  redundancy,  repetition,  enhanced  discrimination  of  speech  sounds,  and 
additional  transcription  methods. 


J. E. UHLANER, Director
U.  S.  Army  Behavioral  Science 
Research  Laboratory 


RELATIONSHIP  OF  EXPRESSED  CONFIDENCE  TO  ACCURACY  OF  TRANSCRIPTION 
BY  OPERATIONAL  COMMUNICATIONS  PERSONNEL 


BRIEF 


Requirement: 

To determine whether experienced communications operators are able to rate their
performance in transcribing voice radio messages partially embedded in noise with sufficient precision
for the ratings to have potential operational utility.

Procedure: 

Eight experienced communications operators rated their confidence in the accuracy of their
reception and transcription of messages received at three signal-to-noise ratios (-6 db, 0 db, +6 db).
A five-point rating scale was used. As a control, they also transcribed messages without making
confidence ratings. Measures of transcript accuracy and expressed confidence in transcription
obtained under the experimental conditions were compared with results from a prior study in which
the subjects were neither communications operators nor trained in any communications procedures
prior to experimental familiarization.


Findings: 

The experienced communications operators were highly successful in judging the accuracy of
their transcription, achieving a close relationship between confidence rating and performance
(r = .78), although overconfidence at the upper end of the scale and underconfidence at the
lower end were evident.

Intelligibility  improved  from  20%  to  88%  as  signal-to-noise  ratio  increased. 

The  experienced  communications  operators  performed  better  than  the  non-communications 
trained  subjects  in  the  former  study  both  in  accuracy  of  transcription  and  in  precision  of  confidence 
ratings.  In  neither  study  was  average  accuracy  of  the  transcripts  affected  by  having  subjects  judge 
their  transcription. 

In  both  studies,  subjects  tended  to  make  effective  use  of  less  than  all  five  points  of  the 
confidence  rating  scale. 

Utilization  of  Findings: 

The practicability of obtaining operationally useful expressions of confidence from transcribers
was strongly supported, although the most effective form for a standardized confidence rating
procedure remains to be determined. Standardized ratings could assist communications analysts and
decision makers, permitting them to weight the transcribed information appropriately and to place
it in proper perspective with respect to data from other sources.
RELATIONSHIP OF EXPRESSED CONFIDENCE TO ACCURACY OF TRANSCRIPTION
BY  OPERATIONAL  COMMUNICATIONS  PERSONNEL 


Magnetic tape recording of incoming messages is standard procedure
in many different voice radio telephone communications operations. The
recordings are used in a variety of ways, including re-transmittal in
radio relay operations and transcription into hard copy for subsequent
analysis in decision making operations. When a message is partially
masked by noise, it is very difficult for the operator to receive and
transcribe the entire message correctly. Unless communications are
being jammed, the unwanted noise tends to be sporadic, and the
intelligibility of different sections of the message varies inversely with the
amount of unwanted noise. The communications transcriber often has
subjective impressions of confidence about the accuracy with which he is
able to transcribe such partially masked messages.

Preliminary research, using personnel without formal training or
experience in communications, has shown a positive relationship between
the transcriber's confidence in his correct reception and his accuracy
of transcription (1). While far from ideal for operational use, this
relationship was sufficient to warrant further research using operational
communications personnel. The existence of a close relationship between
confidence ratings of performance and accuracy of transcription among
experienced operators would be of considerable value in the development
of improved standing operating procedures. The improved procedures could
be applied to all communications operations where information must be
transmitted, extracted, and assimilated. Reliable measures of transcriber
ability to relate confidence to accuracy also could provide the
communications analyst with important time-saving clues. Such measures could
afford objective estimates of the necessity for additional transcriptions
of a message received under marginal or less than marginal listening
conditions (2). More important, by establishing differential levels of
acceptance for sections of transcripts on the basis of the transcriber's
confidence judgments, the analyst might be able to extract more reliable
information from the transcript of a partially masked transmission.

The present study dealt with the ability of operational communicators
to evaluate their own performance in extracting information from
noise-embedded voice radio communications.


METHOD 

In an operational communications situation, the operators, monitors,
and transcribers rarely know the listening conditions under which they
must operate from moment to moment or from message to message.
Measurement of performance under different signal-to-noise ratios was therefore
necessary to obtain information about behavior across listening conditions.
In the present study, measures of two aspects of performance (transcription
accuracy and expressed confidence in the correctness of the transcription)
were obtained at each of three signal-to-noise ratios representing a broad
range of listening conditions. These measures were analyzed to determine
the relationship between the confidence rating and transcription accuracy.


Experimental  Design 

The design was a 3 x 2 x 8 factorial, three signal-to-noise ratios
constituting the first factor, two work methods the second factor, and
eight enlisted men the third factor. In the first work method, the
transcribers assigned confidence ratings to each transcribed word. In the
second work method, no confidence ratings were made. Each man performed
under all six combinations of factors one and two.
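
The factorial layout described above can be enumerated in a few lines. This is only an illustrative sketch: the labels "rated" and "unrated" and the subject numbers 1 through 8 are assumed names, not identifiers taken from the report.

```python
from itertools import product

# Factor levels as described in the text; the labels are illustrative assumptions.
SNR_DB = (-6, 0, 6)                 # three signal-to-noise ratios
WORK_METHODS = ("rated", "unrated")  # with and without confidence ratings
SUBJECTS = tuple(range(1, 9))        # eight enlisted men

def design_cells():
    """Return every (snr, work_method, subject) cell of the 3 x 2 x 8 design."""
    return list(product(SNR_DB, WORK_METHODS, SUBJECTS))

cells = design_cells()
# Each subject serves under all six SNR-by-method combinations.
conditions_for_subject_1 = [(snr, m) for snr, m, s in cells if s == 1]
```

The full design has 48 cells, and every subject contributes to six of them, which is what makes subjects a crossed (repeated-measures) factor in the later analyses of variance.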


Subjects 

The subjects were eight enlisted men selected at random from a
population of school trained, highly experienced operational communicators.
All eight were in PULHES hearing category 1 (supported by MAICO Model H-1
Audiometer hearing tests).^1 All men had had some field experience in the
required MOS and also experience in transcription.


Stimulus  Material 

The stimulus material consisted of the 1,000 phonetically-balanced
monosyllabic words developed by the Harvard Psycho-Acoustics Laboratory (3).
These 1,000 words are divided into 20 lists each consisting of 50 words.
Five complete randomizations of the 20 lists, prerecorded on tape, were
used. The words in each list were presented at an intensity of
approximately 75 decibels (0.0002 dynes per cm²), one word every four and
one-half seconds, at signal-to-noise ratios of +6 db, 0 db, and -6 db. Each
word was preceded by the carrier sentence: "YOU WILL TRA (word)."


Apparatus 

Word lists were reproduced on an Ampex tape recorder (Model 351) and
electronically mixed (Ampex MX-35 Mixer) with noise from a Bruel and Kjaer
Random Noise Generator (No. 1402). The mixed output was amplified
(Macintosh MO-75) and presented binaurally through headphones (Telex,
600 ohm). A double-walled audiometric research sound booth was used both
for training and for data collection.


^1 Identification of instruments and materials is included solely for
precision in reporting experimental procedures and does not constitute
indorsement of any commercial product by the Department of the Army.


Work  Methods 


Subjects listened to and transcribed word lists under each of three
signal-to-noise ratios, rating their confidence in the correctness of
each word as they transcribed it. They also listened to and transcribed
the same lists under the same signal-to-noise ratios without making any
expressions of confidence. Transcription of a word list while making the
confidence ratings was the experimental condition; transcription of the
list without making the confidence rating was the control condition.

Order  of  presentation  of  the  two  conditions  was  randomized  to  control 
for  possible  order  effects. 


Confidence  Rating 

Five  categories  of  expressed  confidence  were  used: 

5  I AM FULLY CONFIDENT THAT I RECEIVED THE WORD CORRECTLY.

4  I AM SUBSTANTIALLY CONFIDENT THAT I RECEIVED THE WORD CORRECTLY.

3  I AM MODERATELY CONFIDENT THAT I RECEIVED THE WORD CORRECTLY.

2  I  AM  SLIGHTLY  CONFIDENT  THAT  I  RECEIVED  THE  WORD  CORRECTLY. 

1  I  AM  NOT  AT  ALL  CONFIDENT  THAT  I  RECEIVED  THE  WORD  CORRECTLY. 

Ratings would be completely accurate if all words rated 5 were
correctly transcribed, all words rated 1 were incorrect, and half of all
words rated 3 were correct, with about three-fourths of all words rated
4 and one-fourth of all words rated 2 correct. Subjects were instructed
to apply the following concept in making their ratings: A rating of 5
was to be assigned when the subject would bet a large sum that his
reception and transcription of a word was in fact correct. Conversely,
he was to assign a rating of 1 to a word when he would not think at all
of betting on its correctness. He was to assign a rating of 3 when he
felt that the word was one of two he could have chosen, and ratings of
either 2 or 4 when he felt that his confidence fell midway between
categories 3 and 1 and categories 3 and 5, respectively.^2
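
The "completely accurate" rating behavior described above amounts to a straight line from 0 percent correct at a rating of 1 to 100 percent correct at a rating of 5. A minimal sketch of that ideal mapping follows; the function name is an illustrative assumption and does not appear in the report.

```python
def ideal_accuracy(rating: int) -> float:
    """Proportion of words that should be correct at a given rating
    if the five-point scale were used with complete accuracy."""
    if rating not in (1, 2, 3, 4, 5):
        raise ValueError("rating must be an integer from 1 to 5")
    # Rating 1 -> 0.00, 2 -> 0.25, 3 -> 0.50, 4 -> 0.75, 5 -> 1.00
    return (rating - 1) / 4

ideal_profile = [ideal_accuracy(r) for r in range(1, 6)]
```

Each one-step increase in rating corresponds to a 0.25 increase in expected proportion correct, which is the ideal regression slope against which observed performance is compared.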


^2 A more complete discussion of the rationale for this rating procedure
may be found in the report of the earlier research (1).
Procedure


Subjects were trained in four groups of two men each.^3 Training was in
accordance with established procedures for speech intelligibility testing (4).
One and one-half days of familiarization, using two of the five
randomizations of the word lists, brought all subjects to an approximately equal level
of familiarity with the stimuli, with the general transcription procedures
required for the experiment, and with the three signal-to-noise ratios.

An additional half day of training was devoted to familiarization with the
confidence rating scale and the experimental conditions. Each pair of
subjects was then tested for twelve experimental sessions, spread over three
days, using the remaining three randomizations of the word lists as stimuli.
Each session consisted of listening to and transcribing 10 word lists. The
experimental sessions were $0 minutes in length. Each pair of subjects had
a rest period of approximately one-half hour between experimental sessions,
with a one-hour lunch break after the first two sessions each day.

The measure of intelligibility was the mean percentage of words
correctly transcribed for all lists. This measure was obtained both for
each of the three signal-to-noise ratios and for combined performance
across signal-to-noise ratios. The measure was obtained separately under
experimental and control conditions.

The  measure  of  accuracy  obtained  under  the  experimental  condition  was 
the  percentage  of  words  given  any  one  rating  which  were  transcribed  correctly. 
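
The two measures defined above can be sketched as follows. The data layout, a list of (rating, is_correct) pairs with one pair per transcribed word, is an assumption made for illustration, and the sample responses are invented; neither comes from the report.

```python
from collections import defaultdict

def accuracy_by_rating(responses):
    """Percentage of words given each rating that were transcribed correctly.

    `responses` is an iterable of (rating, is_correct) pairs (assumed layout).
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for rating, is_correct in responses:
        totals[rating] += 1
        hits[rating] += 1 if is_correct else 0
    return {r: 100.0 * hits[r] / totals[r] for r in sorted(totals)}

def intelligibility(responses):
    """Percentage of all words transcribed correctly, ignoring the ratings."""
    responses = list(responses)
    correct = sum(1 for _, is_correct in responses if is_correct)
    return 100.0 * correct / len(responses)

# Invented sample: seven words with their ratings and scoring.
sample = [(5, True), (5, True), (5, False), (3, True),
          (3, False), (1, False), (1, False)]
per_rating = accuracy_by_rating(sample)
overall = intelligibility(sample)
```

Intelligibility can be computed under both work methods, whereas accuracy by rating exists only for the experimental condition, since the control condition produces no ratings.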


RESULTS 

There was a relatively strong relationship between the confidence which
subjects expressed in the correctness of their transcripts and the accuracy
of received messages. Measured across subjects and the three signal-to-noise
ratios, the coefficient of correlation^4 between confidence and
accuracy was +.78. At each of the five confidence rating steps, mean
accuracy scores were significantly different from each other (p < .001),
and mean accuracy scores increased in linear fashion with the confidence
rating. Summaries of these analyses are presented in Tables A-1 and A-2
of the Appendix; Table A-3 provides the overall mean accuracy of words
rated at each step on the confidence rating scale. The slope of the
linear regression of accuracy on confidence across signal-to-noise ratios
was .184. The slope of the hypothetical ideal linear regression of
accuracy on confidence would be .25. The regression function and the
mean accuracy at each step of the confidence rating scale are shown in
Figure 1 for both observed performance and ideal performance. To achieve
the ideal, mean accuracy scores would have to be 0%, 25%, 50%, 75%, and
100% for confidence ratings 1 through 5, respectively.

^3 Due to duty assignments, subjects were available only in pairs and only
for one consecutive five-day period. Familiarization time was therefore
shortened, and the number of experimental sessions per day was doubled
as compared with the earlier research (1).

^4 All computed correlation coefficients were tetrachoric. This measure
was obtained by collapsing the 2 x 5 (right-wrong x rating scale)
distribution into a 2 x 2 (right-wrong x high-low ratings)
distribution, splitting the rating array as near the median as possible.
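
The reported slope can be checked against the overall mean accuracies in Table A-3 (10, 26, 46, 66, and 82 percent for ratings 1 through 5). The sketch below fits an ordinary least-squares line to those five means. Fitting the means rather than the word-level data is an assumption made here for illustration; the report does not state how the regression was computed.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

ratings = [1, 2, 3, 4, 5]
observed = [0.10, 0.26, 0.46, 0.66, 0.82]   # Table A-3, as proportions
ideal = [0.00, 0.25, 0.50, 0.75, 1.00]      # completely accurate ratings

observed_slope, observed_intercept = least_squares(ratings, observed)
ideal_slope, ideal_intercept = least_squares(ratings, ideal)
```

Under this assumption the fitted slope comes out at .184, matching the figure in the text, against .25 for the ideal line. The shallower observed slope is the numerical form of overconfidence at high ratings and underconfidence at low ratings.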

Overconfidence at the upper end of the function and underconfidence
at the lower end were observed (subjects rated incorrectly transcribed
words high and rated correctly transcribed words low). Eighteen percent
of the words rated 5 by all subjects were incorrectly transcribed, and
ten percent of the words rated 1 were received and transcribed correctly.


Figure 1. Regression of accuracy on confidence across signal-to-noise ratios
Although individual differences were observed among subjects, every
subject showed a close relationship between confidence and accuracy. For
each subject, mean accuracy scores at each of the five confidence rating
steps were significantly different from each other (p < .01), and these
means increased in linear fashion with the confidence rating. The eight
graphs in Figure 2 present both mean accuracy scores at each confidence
rating and regression functions separately for each subject across
signal-to-noise ratios. Where mean accuracy scores at adjacent
confidence ratings for some subjects seem very close, significance was
nonetheless obtained because of the substantial number of determinations at
some rating steps.


Figure 2. Regression of accuracy on confidence by subject across signal-to-noise ratios
(horizontal axis: confidence rating)
a Slope was projected from four points. Subject made no correct responses rated 1.
As expected, intelligibility improved as a direct function of the
signal-to-noise ratio, increasing from a mean of approximately 20 percent
to approximately 88 percent. Figure 3 compares means obtained at each
signal-to-noise ratio under both experimental (rated) and control
(non-rated) conditions. The effect of signal-to-noise ratio on these
intelligibility means was significant (p < .001). Having
subjects assign confidence ratings did not significantly affect mean
intelligibility. Table A-4 presents the means and standard deviations of
the intelligibility scores, and Table A-5 shows the summary of the analysis
of variance.
Figure 3. Mean intelligibility scores as a function of signal-to-noise ratio
Mean confidence ratings also increased as a direct function of
signal-to-noise ratio. Mean ratings at the different signal-to-noise
ratios were significantly different from each other (p < .001). Mean
confidence ratings and significance of difference values for these means
are given in Table A-6.

Since both mean intelligibility and mean confidence were significantly
affected by signal-to-noise ratio, the results for the three
signal-to-noise ratios were analyzed separately. Correlation coefficients^5
between confidence and accuracy were +.49, +.48, and +.^3 for the -6 db,
the 0 db, and the +6 db signal-to-noise ratios, respectively. At each
signal-to-noise ratio, mean accuracy scores for the five confidence
rating steps were significantly different from each other (p < .001).
Moreover, at each signal-to-noise ratio, mean accuracy increased in a
substantially linear fashion with confidence rating, although a slight
curvature (the quadratic component) was apparent at the -6 db
signal-to-noise ratio.^6 Summaries of these analyses are presented in Table A-7.
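
The correlation coefficients in this report are tetrachoric, obtained after collapsing the right-wrong by rating distribution into a 2 x 2 table split near the median rating (footnote 4). The sketch below illustrates that collapsing step and then applies the classical cosine-pi approximation to the tetrachoric coefficient. The approximation is a stand-in chosen for brevity: the report does not say which computing method was used, and the word counts in the example are invented.

```python
import math

def split_near_median(totals_by_rating):
    """Return the cut k (ratings <= k are 'low') that puts the low group
    as close as possible to half of all rated words."""
    n = sum(totals_by_rating.values())
    best_k, best_gap, running = None, None, 0
    for k in sorted(totals_by_rating)[:-1]:
        running += totals_by_rating[k]
        gap = abs(running - n / 2)
        if best_gap is None or gap < best_gap:
            best_k, best_gap = k, gap
    return best_k

def collapse(counts):
    """counts: {rating: (n_right, n_wrong)} -> (a, b, c, d) where
    a = right/high, b = wrong/high, c = right/low, d = wrong/low."""
    totals = {r: right + wrong for r, (right, wrong) in counts.items()}
    k = split_near_median(totals)
    a = b = c = d = 0
    for r, (right, wrong) in counts.items():
        if r > k:
            a, b = a + right, b + wrong
        else:
            c, d = c + right, d + wrong
    return a, b, c, d

def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation: r = cos(pi / (1 + sqrt(ad / bc)))."""
    ad, bc = a * d, b * c
    if ad == 0 and bc == 0:
        raise ValueError("coefficient undefined for this table")
    if bc == 0:
        return 1.0
    return math.cos(math.pi / (1 + math.sqrt(ad / bc)))

# Invented counts of (right, wrong) words at each rating.
example = {1: (5, 45), 2: (10, 30), 3: (20, 20), 4: (30, 10), 5: (45, 5)}
table = collapse(example)
r_tet = tetrachoric_cos_pi(*table)
```

Because the split point depends on how many words fall at each rating, the coefficient at a single signal-to-noise ratio can differ noticeably from the pooled value, which is one reason the pooled and per-condition coefficients need not agree.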

Figure 4. Regression of accuracy on confidence for each signal-to-noise ratio

^5 See footnote 4.

^6 This deviation from linearity, while not significant (.05 < p < .10),
was caused by underconfidence at the -6 db listening condition.


Separate plots of accuracy as a function of confidence at each of
the three signal-to-noise ratios are shown in Figure 4. The slopes of
the linear regressions describing performance at each of the three
signal-to-noise ratios were .156 at -6 db, .179 at 0 db, and .211 at
+6 db. The slopes of the regression of accuracy on confidence were
significantly affected by the signal-to-noise ratio (p < .01).^7 That
this significant interaction is itself linear can be seen from Table A-2.

None of the subjects in this study made effective use of all five
steps in the confidence rating scale (Table A-8). The large number of
high ratings was a result of the relatively high intelligibility at both
the 0 db and the +6 db signal-to-noise ratios.

The sample of operational communications personnel in the present
study clearly outperformed^8 the sample of enlisted men in the earlier
research (1) who had had no previous formal training or experience in
either communications procedures or transcription techniques. From the
analysis summary in Table A-9, performance means were significantly
different from each other (p < .01) and the regression of confidence on
accuracy was significant (p < .001). Although the two samples exhibited
similar performance trends, a much closer relationship between confidence
and transcription accuracy was shown by the operational sample in the
present study (correlation coefficient of .78 as opposed to .57). While
overconfidence at the upper end of the rating scale and underconfidence
at the lower end were observed in both studies, the magnitude of the
observed deviations from the ideal in the present sample was
considerably smaller than in the earlier sample. In the present study, fewer
than 18% of the responses rated 5 were incorrect (compared with 32% in
the earlier study), and only 10% of the responses rated 1 were correct
(compared with 13%). Differences between the two samples in the
relationship between confidence and accuracy become even more apparent when
performance is compared with an ideal where overconfidence and
underconfidence are both non-existent (Figure 5).

Figure 5. Comparison of the regressions of confidence on accuracy for both samples with the ideal regression

^7 Homogeneity of regression (5).

^8 The results were compared across listening conditions. The
unpredictability of moment-to-moment noise interference with voice radio
communications in the field makes the relationship between confidence and
accuracy averaged over all listening conditions the best practical
basis for prediction.


CONCLUSIONS 

In spite of the procedural differences which favored subjects in
the earlier research (a longer familiarization period and fewer sessions
per day), the sample of operational communications personnel in the
present study outperformed the sample in the earlier study. Their formal
school training in general communications procedures coupled with field
experience in voice-radio message transcription under degraded conditions
evidently enabled the operational communicators to "read through noise"
and transcribe more accurately. While overconfidence and underconfidence
still occurred, the magnitude of such errors was less. Ability to rate
one's own performance on the job would appear to be directly related to
experience. It is likely that the well-trained communications operator
implicitly performs some type of evaluating while he is transcribing,
drawing on his past experience to do so. The present study provides
strong indication that trained operators can provide operationally
useful confidence ratings without having their performance affected by the
act of rating. Formulation of a standardized rating scale therefore
becomes practicable.

Had  the  additional  familiarization  time  and  the  three  additional 
testing  days  been  available  in  the  present  study,  as  in  the  earlier 
research,  the  operational  communicators  might  have  even  more  closely 
approximated  the  ideal  in  rating  their  performance.  The  decrease  in 
effectiveness  of  the  confidence  rating  as  a  function  of  the  degradation 
of the message might also have been reduced with additional training.

This is especially important because, even under less-than-marginal
listening conditions, the rating measure affords a valuable basis for
differential weighting of the rated portions of a message. It can be
seen from Figure 4 that mean accuracy at the -6 db signal-to-noise
ratio varied from approximately 6% at the lower end to approximately
6t% at the upper end of the rating scale, yet intelligibility at this
signal-to-noise ratio was only 20%. The introduction of some type of
standardized rating scale in the MOS course training could therefore
prove helpful.

While some question might be raised regarding the potential
deterioration of performance as a result of the extra work required in rating
each message segment as it is transcribed, the data from both the earlier
research (1) and the present study argue strongly against this possibility.
In no case was the performance using ratings significantly different from
its control (see Figure 3 and Table A-4 of this study and the
corresponding figure and table from the earlier report).

Table A-8 and the corresponding table in the earlier report reveal
that the majority of subjects utilized only three levels of confidence
(high, medium, and low) in their ratings. These three levels of
confidence do not correspond to any of the actual points on the rating scale
itself, although the inference is easy to make. The actual ratings on
the scale which were effectively used varied among the subjects. Only
one or two subjects used four scale points effectively. Insufficient
familiarization and training in the use of the ratings, less than
adequate instructions regarding them, or the short time interval between
message presentations (three seconds from the end of one to the onset of
the next) may have been primary causes. For any or all of these reasons,
the five-point rating scale simply may not be the best type to use in
transcription evaluation of this nature. If a standardized rating
procedure is to be introduced into the MOS course or implemented in the
field, the significant determinants of rating effectiveness must be
conclusively identified.

Ultimately, the value of the confidence rating procedure for
implementation depends on the minimization of errors of overconfidence and
underconfidence. The results of the present study, in comparison with
those obtained with the earlier sample, suggest that introducing adequate
instruction in assigning confidence ratings as part of the formal MOS
school curriculum would improve the relationship between confidence and
accuracy by reducing overconfidence and underconfidence, and provide a
basis for the meaningful differential weighting of severely degraded
messages. Implementation of reliable transcriber confidence judgments
would provide communications analysts and decision-makers with an
objective and workable measure of the accuracy of transcripts of degraded
messages. The procedure would assist the analyst by placing transcribed
information in the proper perspective and allowing the decision-makers
to weigh this information properly with respect to data from other sources.
The overall result would be a more efficient and more reliable extraction
of information from noise-embedded voice radio-telephone communications.
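
One way an analyst might apply such differential weighting is sketched below. The per-rating accuracies are the overall means from Table A-3; the acceptance threshold of 0.66 and the sample words are hypothetical choices made for illustration and are not recommendations from the report.

```python
# Overall proportion correct observed at each rating (Table A-3).
OBSERVED_ACCURACY = {1: 0.10, 2: 0.26, 3: 0.46, 4: 0.66, 5: 0.82}

def weight_transcript(rated_words, accept_at=0.66):
    """Attach an estimated probability of correctness to each word.

    `rated_words` is a list of (word, rating) pairs. `accept_at` is a
    hypothetical acceptance threshold: words whose estimated accuracy
    falls below it are flagged for re-transcription or corroboration.
    """
    weighted = []
    for word, rating in rated_words:
        estimate = OBSERVED_ACCURACY[rating]
        weighted.append((word, estimate, estimate >= accept_at))
    return weighted

# Invented transcript fragment with the transcriber's ratings.
fragment = [("north", 5), ("ridge", 4), ("seven", 2)]
weighted_fragment = weight_transcript(fragment)
```

Using observed rather than ideal accuracies builds the measured overconfidence and underconfidence into the weights, so a word rated 5 is treated as about 82 percent reliable rather than certain.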
Table A-1

SUMMARY OF ACCURACY SCORE ANALYSIS OF VARIANCE

Source                            SS       DF       MS          F

Between:
  Subjects (A)                  0.2715      7     0.0388
Within:
  Confidence (B)                7.5606      4     1.8902     102.175 a
  B by A                        0.5169     28     0.0185
  Signal-to-noise Ratio (C)     1.2562      2     0.6281      35.891 b
  C by A                        0.2445     14     0.0175
  BC                            0.1244      8     0.0156       2.364
  BC by A                       0.3579     54 c   0.0066
TOTAL                          10.3320    117

a F (4,28) .001 = 6.25
b F (2,14) .001 = 11.78
c Two cells had no entries.


Table A-2

TREND ANALYSIS

(1) OVERALL TREND ACROSS SIGNAL-TO-NOISE RATIOS

Source                   SS       DF      MS          F

Linear Component       7.5331      1    7.5331    1141.58 *
Quadratic Component    0.0044      1    0.0044       NS
Deviation              0.0231      2    0.0116       NS
Error                  0.3579     54    0.0066

* F (1,54) .001 = 12.16

(2) DIFFERENCES COMPARING SIGNAL-TO-NOISE RATIOS

Source                   SS       DF      MS          F

Linear Component       0.0715      2    0.0358       5.42 *
Quadratic Component    0.0410      2    0.0205       NS
Deviations             0.0119      4    0.0030       NS
Error                  0.3579     54    0.0066

* F (2,54) .01 = 5.0U
Table A-3

MEAN ACCURACY AT EACH CONFIDENCE RATING ACROSS SIGNAL-TO-NOISE RATIOS

Rating                      1     2     3     4     5

Mean Accuracy (percent)    10    26    46    66    82


Table A-4

INTELLIGIBILITY MEANS AND STANDARD DEVIATIONS BY
WORK METHOD AND SIGNAL-TO-NOISE RATIO

                                  Work Method
Signal-to-Noise          Control            Experimental
Ratio                  Mean      SD        Mean      SD

+6 db                 88.26     6.52      88.37     6.54
 0 db                 62.76    11.64      63.^     10.37
-6 db                 20.52     7.90      20.19     7.80

Table A-5

SUMMARY OF INTELLIGIBILITY SCORE ANALYSIS OF VARIANCE

Source                        DF         MS              F

Subjects (S)                   7       275.85
Work Method (W)                1         3.15           NS
W by S                         7         8.11
Signal-to-Noise Ratio (R)      2   377,577.85      9,933.645 *
R by S                        14        38.01
WR                             2        16.66           NS
WR by S                       14        18.60
TOTAL                         47

* F (2,14) .001 = 11.78


Table A-6

MEAN CONFIDENCE RATING AND "t" VALUES
BY SIGNAL-TO-NOISE RATIO

                          Signal-to-Noise Ratio
                         1          2          3

Mean Confidence         2.19       3.88       4.58
σm                      0.055      0.029      0.022

"t" values
  1                      --       10.45 *    16.21 *
  2                                 --        5.48 *

* p < .001




Table A-7

SUMMARY OF ACCURACY SCORE ANALYSIS OF VARIANCE BY
SIGNAL-TO-NOISE RATIO

(1) -6 db SIGNAL-TO-NOISE RATIO

(A) Analysis of Variance

Source                   SS       DF      MS          F

Between:
  Subjects (S)         0.2319      7    0.0331
Within:
  Confidence (C)       1.9653      4    0.4913      49.63 a
  S by C               0.2777     28    0.0099
TOTAL                  2.4749     39

a F(4,28) .001 = 6.25

(B) Trend Analysis

Source                   SS       DF      MS          F

Linear Component       1.9251      1    1.9251     194.45 a
Quadratic Component    0.0360      1    0.0360       3.64 b
Deviations             0.0042      2    0.0021       NS
Error                  0.2777     28    0.0099

a F(1,28) .001 = 13.50
b F(1,28) .10 = 2.89
(2) 0 db SIGNAL-TO-NOISE RATIO

(A) Analysis of Variance

Source                   SS       DF      MS          F

Between:
  Subjects (S)         0.0948      7    0.0135
Within:
  Confidence (C)       2.5721      4    0.6430     126.08 *
  C by S               0.1438     28    0.0051
TOTAL                  2.8107     39

* F(4,28) .001 = 6.25

(B) Trend Analysis

Source                   SS       DF      MS          F

Linear Component       2.5668      1    2.5668     503.29 *
Quadratic Component    0.0004      1    0.0004       NS
Deviations             0.0049      2    0.0024       NS
Error                  0.1438     28    0.0051

* F(1,28) .001 = 13.50
(3) +6 db SIGNAL-TO-NOISE RATIO

(A) Analysis of Variance

Source                   SS       DF      MS          F

Between:
  Subjects (S)         0.1894      7    0.0270
Within:
  Confidence (C)       3.1477      4    0.7869      33.34 *
  C by S               0.6598     28    0.0236
TOTAL                  3.9969     39

* F(4,28) .001 = 6.25

(B) Trend Analysis

Source                   SS       DF      MS          F

Linear Component       3.1126      1    3.1126     131.89 *
Quadratic Component    0.0089      1    0.0089       NS
Deviations             0.0262      2    0.0131       NS
Error                  0.6598     28    0.0236

Table A-8

PERCENTAGE OF WORDS ASSIGNED EACH RATING BY SUBJECT
ACROSS SIGNAL-TO-NOISE RATIOS

                              Rating
Subjects       1        2        3        4        5

   1         24.29    11.73    12.33    13.53    38.13
   2         18.50     5.47    12.63    22.80    40.60
   3         13.60    10.10    23.33     9.03    43.93
   4         15.97     9.57    18.20    15.00    41.26
   5          1.67    19.00    35.63    25.20    18.50
   6          5.43    23.60    14.70    29.67    26.60
   7         11.17    14.63    14.87     5.83    53.50
   8          9.73    12.87    18.70    14.03    44.67

 Mean        12.54    13.37    18.80    16.89    38.40



Table A-9

SUMMARY OF ACCURACY SCORE ANALYSIS OF VARIANCE COMPARING
TRAINED AND UNTRAINED COMMUNICATOR SAMPLES

(1) ANALYSIS OF VARIANCE

Source                   SS       DF      MS          F

Between:
  Groups (G)           0.1037      1    0.1037       9.971 a
  Error (a)            0.1460     14    0.0104
Within:
  Confidence (C)       3.9582      4    0.9900     126.923 b
  C by G               0.1206      4    0.0302       3.872 c
  Error (b)            0.4360     56    0.0078
TOTAL                  4.7645     79

a F (1,14) .01 = 8.86
b F (4,40) .001 = 5.70
c F (4,40) .01 = 3.83

(2) INTERACTION TREND ANALYSIS

Source                   SS       DF      MS          F

Linear Component       0.1025      1    0.1025      13.141 *
Deviations             0.0181      3    0.0060       NS
Error (b)              0.4360     56    0.0078